Table of content

3. Main questions

Question 1: Does the cell doubling time correlate with reduced drug sensitivity?

[Linear regression ~ Foldchange-means](#css_class3.1.1)

[Linear regression ~ Foldchange-means biomarkers](#css_class3.1.2)

[Linear regression ~ Doubling-time](#css_class3.1.3)

[Multiple regression](#css_class3.1.4)

[Table of conclusion](#css_class3.1.5)

Question 2: Does Lapatinib have a similar effect on lung cancer as Erlotinib?

[Matrix of correlations](#css_class3.2.1)

[Plot Erlotinib and Lapatinib, coloured by tissue](#css_class3.2.2)

[correlation erlotinib , lapatinib](#css_class3.2.3)

[Lung cellines](#css_class3.2.4)

[Anova](#css_class3.2.5)

Question 3: Comparing lapatinib treated breast and cns celllines

3. Main questions

Question 1: Does the cell doubling time correlate with reduced drug sensitivity?

As mentioned in our presentation, we want to create a model to predict GI50-values and thus to predict, if Lapatinib is a good choice. Therefore, we took several datasets and performed 4 regression analyses.

Linear regression ~ Foldchange-means

The first linear model tries to predict the G-50 value under the data of the Foldchange-means.

Linear regression ~ Foldchange-means biomarkers

Linear regression is not suitable here, as the mean fold-change per cell line is not a meaningful indicator in a biological context. Hence, we selected the top 20 genes to calculate our mean values. Afterwards linear regression was performed.

Linear regression ~ Doubling-time

The second linear model tries to predict the G-50 value under the data of the Doubling-time.

Multiple regression

Finally, we did a multiple regression with both datasets to predict GI50-values.

Table of conclusion

The following table shows important values.

Foldchange FoldchangeBM Doublingtime Multiple regression
R-squared-value 0.1397262 0.0061419 0.0141764 0.0099915
F-statistic 6.8216673 0.2595551 0.6614919 0.2068926
RMSE of test-dataset 0.7596728 1.0514468 1.0736433 1.0230064

The data is hard to discuss, because we used random values for our test and training dataset. Hence, different values on every run are obtained. We could use “selected values” (First 20) but this would be a kind of cheating.

Question 2: Does Lapatinib have a similar effect on lung cancer as Erlotinib?

As we read in the paper “Antitumor and antiangiogenic effect of the dual EGFR and HER-2 tyrosine kinase inhibitor lapatinib in a lung cancer model. (Diaz et al., 2010)”, lapatinib and erlotinib are said to have similar effects. To verify this, we first looked at the correlation of the drugs and then created a graph showing the difference between GI50 for a certain cell line and the middle GI50 for both drugs.

Matrix of correlations

Erlotinib and lapatinib have a strong correlation.

Plot Erlotinib and Lapatinib, coloured by tissue

The difference from the NegLogGI50 for a particular cell line and the mean NegLogGI50 is plotted here for Erlotinib and Lapatinib.

correlation erlotinib, lapatinib

## [1] 0.6528188

A Pearson correlation coefficient of ~ 0.65 confirms that these patterns are similar. One reason for this is that both erlotinib and lapatinib are EGFR inhibitors. Cell lines that were more sensitive are displayed as bars that project to the right of the mean. Cell lines that were less sensitive are displayed with bars projected to the left.

Lung cellines

To answer our more specific question, whether lapatinib also has an effect on lung cancer, we will only look at the cell lines in lung cancer.

The difference from the NegLogGI50 for cell line from lung tissue and the mean NegLogGI50 is plotted here for Erlotinib and Lapatinib.

## [1] 0.9609488

A pearson correlation coefficent of ~ 0.96 suggests that Lapatinib has a similar effect on lung cancer as Erlotinib.

Anova

selection of Lapatinib and Erlotinib treated cells

lapa<-data.frame(Metadata[which(Metadata[,'drug'] == "lapatinib"), ])
erlo<-data.frame(Metadata[which(Metadata[,'drug'] == "erlotinib"), ])
el<-right_join(lapa,erlo, by="cell")
rmv.rows = apply(el, 1, function(x) {
  sum(is.na(x))
})  # Go through each row and sum up all missing values
row.names(rmv.rows)

Create data frame with lapatinib and erlotinib data

fc<-data.frame(Treated-Untreated)
dim(fc)
## [1] 13299   819
all<-data.frame(fc[grep("lapatinib|erlotinib", colnames(fc))])
dim(all)
## [1] 13299   113

since erlotinip contains more columns than lapatinib, we have to remove these columns

all.rmv<-data.frame(all %>% select(
                    -"CCRF.CEM_erlotinib_10000nM_24h", 
                    -"HL.60_erlotinib_10000nM_24h", 
                    -"HT29_erlotinib_10000nM_24h", 
                    -"K.562_erlotinib_10000nM_24h", 
                    -"LOX_erlotinib_10000nM_24h",
                    -"SR_erlotinib_10000nM_24h",
                    -"COLO.205_lapatinib_10000nM_24h"))
dim(all.rmv)
## [1] 13299   106

Checking the rows

la<-data.frame(all.rmv[grep("lapatinib", colnames(all.rmv))])
ncol(la)
## [1] 53
er<-data.frame(all.rmv[grep("erlotinib", colnames(all.rmv))])
ncol(er)
## [1] 53
erla<-data.frame(er,la)
ncol(all.rmv) #to prove if the columns are removed
## [1] 106

Anova

drug<-c(rep('Erlotinib',53), rep('Lapatinib',53))
expression_drug<-apply(erla, MARGIN = 2, sum)
df_drug<-data.frame(expression_drug, drug)
library(ggpubr)
ggboxplot (data = df_drug, x="drug", y="expression_drug", color = "drug",    
          add = "jitter", legend = "none")+
          rotate_x_text(angle = 45)+
          stat_compare_means(method = "anova")+        # Add global annova p-value
          stat_compare_means(label = "p.signif", method = "t.test",
          ref.group = ".all.", hide.ns = TRUE)      # Pairwise comparison against all

Question 3: Comparing lapatinib treated breast and cns celllines

As a proven fact, lapatinb crosses the brain-blood barrier. Since breast cancer tends to spread and form metastasis in brain cells and lapatinib is used in anti-breast cancer therapy, we wanted to analyse our data for similar effects after lapatinib treatment in breast and cns tissue cells.

At first, we selected lapatinib response genes by creating matrices with cns- and breast-cell line fold change values.

Next, we performed a paired t-test of treated and untreated cns and breast samples, respectively, to determine statistical significant fold change values. Since we are performing multiple testing over different cell lines from breast and cns tissues, we want to adjust our p value by performing a Benjamini Hochberg correction.

#performing a paired t-test of treated and untreated samples
t_test_cns <- col_t_paired(cnsTreated, cnsUntreated, alternative = "two.sided", mu = 0,conf.level = 0.95)
t_test_breast <- col_t_paired(breastTreated, breastUntreated, alternative = "two.sided", mu = 0,conf.level = 0.95)

#obtaining Benjamini-Hochberg adjusted p-values
pval_cns <- t_test_cns$pvalue
pval_breast <- t_test_breast$pvalue

fdr_cns <- p.adjust(pval_cns, "BH")
fdr_breast <- p.adjust(pval_breast, "BH")


#obtaining mean FC values over all samples 
breastFCm <- as.numeric(colMeans(breastFC))
cnsFCm <- as.numeric(colMeans(cnsFC))
genes <- colnames(breastFC)

For visualization of the biological significance (fold change values) and the statistical significance (adjusted p-values), we plotted our data in an interactive volcano plot.

## breast volcano plot
#creating a matrix containg all needed values for plotting
diff_df_breast <- data.frame(gene = genes, Fold = breastFCm, FDR = fdr_breast)
diff_df_breast$absFold <- abs(diff_df_breast$Fold)
head(diff_df_breast)
##     gene         Fold       FDR     absFold
## 1   A1CF  0.037268413 0.8765540 0.037268413
## 2    A2M -0.032213825 0.7188608 0.032213825
## 3 A4GALT  0.006012452 0.9793436 0.006012452
## 4  A4GNT -0.053969518 0.4235638 0.053969518
## 5   AAAS  0.081656784 0.5283372 0.081656784
## 6   AACS  0.023767096 0.8115022 0.023767096
# add a grouping column; default value is "not significant"
diff_df_breast$group <- "NotSignificant"



# change the grouping for the entries with significance but not a large enough Fold change
diff_df_breast[which(diff_df_breast['FDR'] < 0.5 & (diff_df_breast['absFold']) < 0.2 ),"group"] <- "Significant"

# change the grouping for the entries a large enough Fold change but not a low enough p value
diff_df_breast[which(diff_df_breast['FDR'] > 0.5 & (diff_df_breast['absFold']) > 0.2 ),"group"] <- "FoldChange"

# change the grouping for the entries with both significance and large enough fold change
diff_df_breast[which(diff_df_breast['FDR'] < 0.5 & (diff_df_breast['absFold']) > 0.2 ),"group"] <- "Significant&FoldChange"


# Find and label the top peaks.
top_peaks_breast <- diff_df_breast[with(diff_df_breast, order(Fold, FDR)),][1:10,]
top_peaks_breast <- rbind(top_peaks_breast, diff_df_breast[with(diff_df_breast, order(-Fold, FDR)),][1:10,])


# Add gene labels for all of the top genes we found
# creating an empty list, and filling it with entries for each row in the dataframe
# each list entry is another list with named items that will be used 
a <- list()
for (i in seq_len(nrow(top_peaks_breast))) {
  m <- top_peaks_breast[i, ]
  a[[i]] <- list(
    x = m[["Fold"]],
    y = -log10(m[["FDR"]]),
    text = m[["gene"]],
    xref = "x",
    yref = "y",
    showarrow = TRUE,
    arrowhead = 0.5,
    ax = 20,
    ay = -40
  )
}

plot_breast <- plot_ly(data = diff_df_breast, x = diff_df_breast$Fold, y = -log10(diff_df_breast$FDR), type = "scatter", text = diff_df_breast$gene, mode = "markers", color = diff_df_breast$group) %>% 
  layout(title ="Volcano Plot of Lapatinib breast cancer samples", 
         xaxis = list(title="log2 Fold Change"),
         yaxis = list(title="FDR")) %>%
  layout(annotations = a)
plot_breast
###thresholds still need to be discussed
## CNS volcano plot

diff_df_cns <- data.frame(gene = genes, Fold = cnsFCm, FDR = fdr_cns)
diff_df_cns$absFold <- abs(diff_df_cns$Fold)
head(diff_df_cns)
##     gene         Fold       FDR     absFold
## 1   A1CF  0.066575311 0.5939566 0.066575311
## 2    A2M  0.038348381 0.6873009 0.038348381
## 3 A4GALT  0.000390011 0.9980719 0.000390011
## 4  A4GNT -0.018219799 0.8780106 0.018219799
## 5   AAAS  0.014723327 0.9008420 0.014723327
## 6   AACS  0.003887384 0.9870209 0.003887384
# add a grouping column; default value is "not significant"
diff_df_cns$group <- "NotSignificant"


# change the grouping for the entries with significance but not a large enough Fold change
diff_df_cns[which(diff_df_cns['FDR'] < 0.5 & (diff_df_cns['absFold']) < 0.2 ),"group"] <- "Significant"

# change the grouping for the entries a large enough Fold change but not a low enough p value
diff_df_cns[which(diff_df_cns['FDR'] > 0.5 & (diff_df_cns['absFold']) > 0.2 ),"group"] <- "FoldChange"

# change the grouping for the entries with both significance and large enough fold change
diff_df_cns[which(diff_df_cns['FDR'] < 0.5 & (diff_df_cns['absFold']) > 0.2 ),"group"] <- "Significant&FoldChange"


# Find and label the top peaks..
top_peaks_cns <- diff_df_cns[with(diff_df_cns, order(Fold, FDR)),][1:10,]
top_peaks_cns <- rbind(top_peaks_cns, diff_df_cns[with(diff_df_cns, order(-Fold, FDR)),][1:10,])


a <- list()
for (i in seq_len(nrow(top_peaks_cns))) {
  m <- top_peaks_cns[i, ]
  a[[i]] <- list(
    x = m[["Fold"]],
    y = -log10(m[["FDR"]]),
    text = m[["gene"]],
    xref = "x",
    yref = "y",
    showarrow = TRUE,
    arrowhead = 0.5,
    ax = 20,
    ay = -40
  )
}

plot_cns <- plot_ly(data = diff_df_cns, x = diff_df_cns$Fold, y = -log10(diff_df_cns$FDR),type = "scatter", text = diff_df_cns$gene, mode = "markers", color = diff_df_cns$group) %>% 
  layout(title ="Volcano Plot of Lapatinib CNS cancer samples",
         xaxis = list(title="log2 Fold Change"),
         yaxis = list(title="FDR"))%>%
  layout(annotations = a)
plot_cns

To visualize the expression patterns of the top peak genes present in both tissue types we calculated a heat map, comparing the expression profiles of the two tissue types.

# selecet top peak genes common in cns and breast tissue
tpb_comparison <- subset(top_peaks_breast, gene %in% top_peaks_cns$gene)
tpc_comparison <- subset(top_peaks_cns, gene %in% top_peaks_breast$gene)


# order common genes alphabetically
tpb_comparison <- tpb_comparison[order(tpb_comparison$gene),]
tpc_comparison <- tpc_comparison[order(tpc_comparison$gene),]


## creating heat map of FCs to compare values 
cor_mat <- cbind("breast" = tpb_comparison$Fold, "cns" = tpc_comparison$Fold)
rownames(cor_mat) <- tpb_comparison$gene
data <- read.delim


pheatmap(
  mat               = cor_mat,
  color             = magma(10),
  border_color      = "black",
  show_colnames     = TRUE,
  show_rownames     = TRUE,
  drop_levels       = TRUE,
  fontsize          = 14,
  main              = "Comparison:
  FC levels of cns and breast top peak genes"
)

# PLot

ggplot(dif.FC.BC,aes(x = gene, y = breast)) +
  geom_bar(stat = "identity", fill = "skyblue") + 
  geom_text(aes(label = round(breast, 2)), vjust = -0.5, color = "black", size = 3) + 
  coord_flip() + labs(title = "Mean graph plot of Fold Change for breast top peak genes")

ggplot(dif.FC.BC,aes(x = gene, y = cns)) +
  geom_bar(stat = "identity", fill = "skyblue") + 
  geom_text(aes(label = round(cns, 2)), vjust = -0.5, color = "black", size = 3) + 
  coord_flip() + labs(title = "Mean graph plot of Fold Change for CNS top peak genes")

At last, we investigated the expression regulation in both tissue types by calculating the pearson correlation between the fold change values of the top peak genes.

#correlation
cor(cor_mat$breast, cor_mat$cns, method = "pearson")
## [1] 0.921812

The obtained correlation value of 0.92 indicates that lapatinib treatment yields a similar effect on gene expression in breast and cns tissue cells and therefore might be possible to treat cancer in brain metastasis.